
Chinese text distinction and font identification by recognizing most frequently used characters

Internal identifier: 001B57 (Main/Exploration); previous: 001B56; next: 001B58

Authors: Chi-Fang Lin [Taiwan, People's Republic of China]; Yu-Fan Fang [People's Republic of China]; Yau-Tarng Juang [People's Republic of China]

Source :

Image and Vision Computing [ 0262-8856 ] ; 2000.

RBID : ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3

English descriptors

KwdEn :
- Character recognition, Feature extraction, Font identification, Template matching, Text distinction.

Abstract

This study proposes a method for implementing three functions that can greatly assist a traditional OCCR (Optical Chinese Character Recognition) system: (1) identifying the font used in a document; (2) detecting and recognizing the most frequently used (MFU) characters; and (3) distinguishing between machine-printed and hand-written characters. According to a study by Chang and Chen (Proceedings of the ICCC, 1994, pp. 310–316), the top-40 MFU characters account for about 20% of the Chinese characters in a text document. If those MFU characters can be detected before the traditional OCCR method is applied, computation time can be reduced considerably. The proposed character detection method consists of three stages: segmentation, feature extraction, and classification. In the first stage, the projection-profile-based method presented by Wang et al. (Pattern Recognition 30 (1997) 1213) is used to segment individual characters from the input text document. In the second stage, three types of features are introduced: the density of black pixels, the projection profile code, and the modified skeleton template. These features are used to check whether a segmented character is semi-matched or fully matched with an MFU template. In the last stage, three algorithms that implement the aforementioned functions are provided, based on the matching result. Experimental results are given to demonstrate the practicality and superiority of the proposed method.
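
As a rough illustration of two ideas named in the abstract (projection-profile segmentation and black-pixel density), the sketch below works on a toy binary image. It is not the method of Wang et al. nor the authors' feature definitions, which this record does not give. The function names, the toy data, and the rule "split wherever a column has no black pixels" are all assumptions made for the example.

```python
# Toy binary text line: 1 = black pixel, 0 = white pixel (hypothetical data).
LINE = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0],
]


def column_profile(image):
    """Vertical projection profile: number of black pixels in each column."""
    return [sum(column) for column in zip(*image)]


def segment_columns(image):
    """Return (start, end) column spans separated by fully white columns.

    Assumed simplification: any white column is treated as a character gap.
    """
    spans, start = [], None
    for x, count in enumerate(column_profile(image)):
        if count > 0 and start is None:
            start = x                      # a run of inked columns begins
        elif count == 0 and start is not None:
            spans.append((start, x))       # the run ends at a white column
            start = None
    if start is not None:                  # run reaches the right edge
        spans.append((start, len(image[0])))
    return spans


def black_density(image, span):
    """Fraction of black pixels inside the given column span."""
    x0, x1 = span
    black = sum(sum(row[x0:x1]) for row in image)
    return black / (len(image) * (x1 - x0))


spans = segment_columns(LINE)
densities = [black_density(LINE, span) for span in spans]
print(spans, densities)
```

In a pipeline of the kind the abstract describes, a cheap feature such as this density could serve as a first filter before more expensive template matching, but the thresholds and matching rules would have to come from the paper itself.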

Url:

https://api.istex.fr/document/4A8175B424D8D0E33BD442A591B43A5C1A0428A3/fulltext/pdf

DOI: 10.1016/S0262-8856(00)00082-2

Affiliations:

People's Republic of China, Taiwan

Links toward previous steps (curation, corpus...)

to stream Istex, to step Corpus: 000057
to stream Istex, to step Curation: 000056
to stream Istex, to step Checkpoint: 001211
to stream Main, to step Merge: 001C50
to stream Main, to step Curation: 001B57

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Chinese text distinction and font identification by recognizing most frequently used characters</title>
<author><name sortKey="Lin, Chi Fang" sort="Lin, Chi Fang" uniqKey="Lin C" first="Chi-Fang" last="Lin">Chi-Fang Lin</name>
</author>
<author><name sortKey="Fang, Yu Fan" sort="Fang, Yu Fan" uniqKey="Fang Y" first="Yu-Fan" last="Fang">Yu-Fan Fang</name>
</author>
<author><name sortKey="Juang, Yau Tarng" sort="Juang, Yau Tarng" uniqKey="Juang Y" first="Yau-Tarng" last="Juang">Yau-Tarng Juang</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3</idno>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1016/S0262-8856(00)00082-2</idno>
<idno type="url">https://api.istex.fr/document/4A8175B424D8D0E33BD442A591B43A5C1A0428A3/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000057</idno>
<idno type="wicri:Area/Istex/Curation">000056</idno>
<idno type="wicri:Area/Istex/Checkpoint">001211</idno>
<idno type="wicri:doubleKey">0262-8856:2001:Lin C:chinese:text:distinction</idno>
<idno type="wicri:Area/Main/Merge">001C50</idno>
<idno type="wicri:Area/Main/Curation">001B57</idno>
<idno type="wicri:Area/Main/Exploration">001B57</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Chinese text distinction and font identification by recognizing most frequently used characters</title>
<author><name sortKey="Lin, Chi Fang" sort="Lin, Chi Fang" uniqKey="Lin C" first="Chi-Fang" last="Lin">Chi-Fang Lin</name>
<affiliation wicri:level="1"><country wicri:rule="url">Taïwan</country>
</affiliation>
<affiliation wicri:level="1"><country xml:lang="fr" wicri:curation="lc">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Engineering and Science, Yuan-Ze University, Chung-Li 320, Taiwan</wicri:regionArea>
<wicri:noRegion>Taiwan</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Fang, Yu Fan" sort="Fang, Yu Fan" uniqKey="Fang Y" first="Yu-Fan" last="Fang">Yu-Fan Fang</name>
<affiliation wicri:level="1"><country xml:lang="fr" wicri:curation="lc">République populaire de Chine</country>
<wicri:regionArea>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan</wicri:regionArea>
<wicri:noRegion>Taiwan</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Juang, Yau Tarng" sort="Juang, Yau Tarng" uniqKey="Juang Y" first="Yau-Tarng" last="Juang">Yau-Tarng Juang</name>
<affiliation wicri:level="1"><country xml:lang="fr" wicri:curation="lc">République populaire de Chine</country>
<wicri:regionArea>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan</wicri:regionArea>
<wicri:noRegion>Taiwan</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Image and Vision Computing</title>
<title level="j" type="abbrev">IMAVIS</title>
<idno type="ISSN">0262-8856</idno>
<imprint><publisher>ELSEVIER</publisher>
<date type="published" when="2000">2000</date>
<biblScope unit="volume">19</biblScope>
<biblScope unit="issue">6</biblScope>
<biblScope unit="page" from="329">329</biblScope>
<biblScope unit="page" to="338">338</biblScope>
</imprint>
<idno type="ISSN">0262-8856</idno>
</series>
<idno type="istex">4A8175B424D8D0E33BD442A591B43A5C1A0428A3</idno>
<idno type="DOI">10.1016/S0262-8856(00)00082-2</idno>
<idno type="PII">S0262-8856(00)00082-2</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0262-8856</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Feature extraction</term>
<term>Font identification</term>
<term>Template matching</term>
<term>Text distinction</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">In this study, the method of implementing the three functions that can offer great help for a traditional OCCR (Optical Chinese Character Recognition) system is proposed: (1) to identify the font used in a document; (2) to detect and recognize the most frequently used (MFU) characters; and (3) to distinguish between the machine-printed and hand-written characters. According to the study investigated by Chang and Chen (Proceedings of the ICCC, 1994, pp. 310–316), about 20% of Chinese characters in a text document are predominated by the top-40 MFU characters. If those MFU characters in a text document can be detected before adopting the traditional OCCR method, there will be great savings in computation time. The proposed method for character detection consists of the following three stages: the stage of segmentation, the stage of feature extraction, and the stage of classification. In the first stage, based on the concept of projection profile, the method presented by Wang et al. (Pattern Recognition 30 (1997) 1213) is utilized to segment characters individually from the input text document. In the second stage, three different types of features are introduced, including the density of black pixels, the projection profile code, and the modified skeleton template. These features are used to check whether the segmented character is semi-matched or fully-matched with the MFU template. Finally, in the last stage, based on the matching result, three different algorithms for implementing the aforementioned functions are provided. Experimental results are given in this study to demonstrate the practicality and superiority of the proposed method.</div>
</front>
</TEI>
<affiliations><list><country><li>République populaire de Chine</li>
<li>Taïwan</li>
</country>
</list>
<tree><country name="Taïwan"><noRegion><name sortKey="Lin, Chi Fang" sort="Lin, Chi Fang" uniqKey="Lin C" first="Chi-Fang" last="Lin">Chi-Fang Lin</name>
</noRegion>
</country>
<country name="République populaire de Chine"><noRegion><name sortKey="Lin, Chi Fang" sort="Lin, Chi Fang" uniqKey="Lin C" first="Chi-Fang" last="Lin">Chi-Fang Lin</name>
</noRegion>
<name sortKey="Fang, Yu Fan" sort="Fang, Yu Fan" uniqKey="Fang Y" first="Yu-Fan" last="Fang">Yu-Fan Fang</name>
<name sortKey="Juang, Yau Tarng" sort="Juang, Yau Tarng" uniqKey="Juang Y" first="Yau-Tarng" last="Juang">Yau-Tarng Juang</name>
</country>
</tree>
</affiliations>
</record>
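
Outside Dilib, the main fields of such a record can also be read with a general-purpose XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the record. Two assumptions are made: the `wicri:` prefix is not declared in the record as shown, so a placeholder namespace URI (not the real Wicri one) is declared on the root so that a strict parser accepts it; and `summarize` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record above. The xmlns:wicri declaration is a
# placeholder added for this example (assumption, not the real Wicri URI).
RECORD = """<record xmlns:wicri="urn:example:wicri">
<TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>
<titleStmt><title xml:lang="en">Chinese text distinction and font identification by recognizing most frequently used characters</title>
<author><name first="Chi-Fang" last="Lin">Chi-Fang Lin</name></author>
<author><name first="Yu-Fan" last="Fang">Yu-Fan Fang</name></author>
<author><name first="Yau-Tarng" last="Juang">Yau-Tarng Juang</name></author>
</titleStmt>
<publicationStmt>
<idno type="RBID">ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3</idno>
<idno type="doi">10.1016/S0262-8856(00)00082-2</idno>
</publicationStmt>
</fileDesc></teiHeader></TEI>
</record>"""


def summarize(record_xml):
    """Extract title, author names, DOI and RBID from a record string."""
    root = ET.fromstring(record_xml)
    title_stmt = root.find(".//titleStmt")
    # Map each idno's type attribute to its text content.
    idnos = {idno.get("type"): idno.text for idno in root.iter("idno")}
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.iter("name")],
        "doi": idnos.get("doi"),
        "rbid": idnos.get("RBID"),
    }


summary = summarize(RECORD)
print(summary)
```

The full record repeats several `idno` types (for example the DOI appears in both `publicationStmt` and `biblStruct`), so a dictionary keyed on `type` keeps only the last value seen; that is acceptable here because the repeated values are identical.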

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001B57 | SxmlIndent | more

Or:

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001B57 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3
   |texte=   Chinese text distinction and font identification by recognizing most frequently used characters
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
